Animation
6 science milestones turning 40 this year
In 1986, we had huge leaps forward, tragic steps back, and life-changing innovations.
[Photo: NASA's STS-51L crew members pose for photographs during a break in countdown training at the White Room, Launch Complex 39, Pad B. Left to right: Teacher-in-Space payload specialist Sharon Christa McAuliffe; payload specialist Gregory Jarvis; and astronauts Judith A. Resnik, mission specialist; Francis R. (Dick) Scobee, mission commander; Ronald E. McNair, mission specialist; Mike J. Smith, pilot; and Ellison S. Onizuka, mission specialist.]
It was a year that saw roughly six million Americans hold hands in a (more or less) continuous line across the country to raise money to fight homelessness. A news anchor named Oprah Winfrey debuted her new talk show.
- North America > United States (0.51)
- Europe > Russia (0.16)
- Asia > Russia (0.16)
- (4 more...)
- Media > Film (1.00)
- Government (1.00)
- Energy > Power Industry > Utilities > Nuclear (0.95)
- Leisure & Entertainment > Games > Computer Games (0.71)
- Information Technology > Graphics > Animation (0.30)
- Information Technology > Artificial Intelligence > Games (0.30)
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic Tom and Jerry cartoon series. Cartoons use the principles of animation that allow animators to create expressive, unambiguous causal relationships between events to form a coherent storyline. Utilizing these properties, along with thought-provoking questions and multi-level answers (answer and detailed causal explanation), our questions involve causal chains that interconnect multiple dynamic interactions between characters and visual scenes. These factors demand models to solve more challenging, yet well-defined causal relationships. We also introduce hard incorrect answer mining, including a causally confusing version that is even more challenging. While models perform well, there is much room for improvement, especially on open-ended answers. We identify more advanced/explicit causal relationship modeling & joint modeling of vision and language as the immediate areas for future efforts to focus on. Along with the other complementary datasets, our new challenging dataset will pave the way for these developments in the field.
- Information Technology > Artificial Intelligence (0.79)
- Information Technology > Graphics > Animation (0.59)
The Truth About the Avatar Movies That No One Wants to Accept
James Cameron is desperate to convince the world that these movies aren't "cartoons."
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (0.47)
- Information Technology > Graphics > Animation (0.33)
- Information Technology > Communications > Social Media (0.30)
PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles
Han, Tianshun, Zhou, Benjia, Liu, Ajian, Liang, Yanyan, Zhang, Du, Lei, Zhen, Wan, Jun
PESTalk is a novel method for generating 3D facial animations with personalized emotional styles directly from speech. It overcomes key limitations of existing approaches by introducing a Dual-Stream Emotion Extractor (DSEE) that captures both time and frequency-domain audio features for fine-grained emotion analysis, and an Emotional Style Modeling Module (ESMM) that models individual expression patterns based on voiceprint characteristics. To address data scarcity, the method leverages a newly constructed 3D-EmoStyle dataset. Evaluations demonstrate that PESTalk outperforms state-of-the-art methods in producing realistic and personalized facial animations.
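The abstract's Dual-Stream Emotion Extractor (DSEE) is described as capturing both time- and frequency-domain audio features. As a rough illustration only (the function name, frame sizes, and feature choices below are assumptions, not the paper's actual extractor), the dual-stream idea can be sketched as framing a waveform once and feeding the same frames to a time-domain stream and a magnitude-spectrum stream:

```python
import numpy as np

def dual_stream_features(wave, frame=400, hop=160):
    """Hypothetical sketch of a dual-stream extractor: one stream keeps raw
    time-domain frames, the other takes magnitude spectra of the same frames.
    Not the paper's DSEE; just the two-domain split it describes."""
    n = 1 + (len(wave) - frame) // hop
    frames = np.stack([wave[i * hop : i * hop + frame] for i in range(n)])
    time_stream = frames                               # (n, frame) time-domain
    freq_stream = np.abs(np.fft.rfft(frames, axis=1))  # (n, frame//2 + 1) spectra
    return time_stream, freq_stream

t, f = dual_stream_features(np.random.randn(16000))
```

In a real model, each stream would then pass through its own encoder before fusion; the sketch only shows where the two domains diverge.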
- Asia > Macao (0.06)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
- Asia > China > Beijing > Beijing (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Graphics > Animation (0.98)
- Information Technology > Artificial Intelligence > Vision (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
AnimAgents: Coordinating Multi-Stage Animation Pre-Production with Human-Multi-Agent Collaboration
Wang, Wen-Fan, Lu, Chien-Ting, Ng, Jin Ping, Chiu, Yi-Ting, Lee, Ting-Ying, Wang, Miaosen, Chen, Bing-Yu, Chen, Xiang 'Anthony'
Animation pre-production lays the foundation of an animated film by transforming initial concepts into a coherent blueprint across interdependent stages such as ideation, scripting, design, and storyboarding. While generative AI tools are increasingly adopted in this process, they remain isolated, requiring creators to juggle multiple systems without integrated workflow support. Our formative study with 12 professional creative directors and independent animators revealed key challenges in their current practice: Creators must manually coordinate fragmented outputs, manage large volumes of information, and struggle to maintain continuity and creative control between stages. Based on these insights, we present AnimAgents, a human-multi-agent collaborative system that coordinates complex, multi-stage workflows through a core agent and specialized agents, supported by dedicated boards for the four major stages of pre-production. AnimAgents enables stage-aware orchestration, stage-specific output management, and element-level refinement, providing an end-to-end workflow tailored to professional practice. In a within-subjects summative study with 16 professional creators, AnimAgents significantly outperformed a strong single-agent baseline equipped with advanced parallel image generation in coordination, consistency, information management, and overall satisfaction (p < .01). A field deployment with 4 creators further demonstrated AnimAgents' effectiveness in real-world projects.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > Austria > Vienna (0.14)
- (10 more...)
- Workflow (1.00)
- Research Report > Experimental Study (1.00)
- Questionnaire & Opinion Survey (1.00)
- (3 more...)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Human Computer Interaction > Interfaces (1.00)
- Information Technology > Graphics > Animation (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- (3 more...)
Once Upon an AI: Six Scaffolds for Child-AI Interaction Design, Inspired by Disney
To build AI that children can intuitively understand and benefit from, designers need a design grammar that serves their developmental needs. This paper bridges artificial intelligence design for children - an emerging field still defining its best practices - and animation, a well-established field with decades of experience in engaging children through accessible storytelling. Pairing Piagetian developmental theory with design pattern extraction from 52 works of animation, the paper presents a six-scaffold framework that integrates design insights transferable to child-centred AI design: (1) signals for visual animacy and clarity, (2) sound for musical and auditory scaffolding, (3) synchrony in audiovisual cues, (4) sidekick-style personas, (5) storyplay that supports symbolic play and imaginative exploration, and (6) structure in the form of predictable narratives. These strategies, long refined in animation, function as multimodal scaffolds for attention, understanding, and attunement, supporting learning and comfort. This structured design grammar is transferable to AI design. By reframing cinematic storytelling and child development theory as design logic for AI, the paper offers heuristics for AI that aligns with the cognitive stages and emotional needs of young users. The work contributes to design theory by showing how sensory, affective, and narrative techniques can inform developmentally attuned AI design. Future directions include empirical testing, cultural adaptation, and participatory co-design.
- North America > United States > New York (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- (2 more...)
- Leisure & Entertainment (1.00)
- Media > Film (0.93)
- Information Technology (0.67)
- (3 more...)
Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation
Mao, Yuxiang, Zhang, Zhijie, Zhang, Zhiheng, Liu, Jiawei, Zeng, Chen, Xia, Shihong
Expressions are fundamental to conveying human emotions. With the rapid advancement of AI-generated content (AIGC), realistic and expressive 3D facial animation has become increasingly crucial. Despite recent progress in speech-driven lip-sync for talking-face animation, generating emotionally expressive talking faces remains underexplored. A major obstacle is the scarcity of real emotional 3D talking-face datasets due to the high cost of data capture. To address this, we model facial animation driven by both speech and emotion as a linear additive problem. Leveraging a 3D talking-face dataset with neutral expressions (VOCAset) and a dataset of 3D expression sequences (Florence4D), we jointly learn a set of blendshapes driven by speech and emotion. We introduce a sparsity constraint loss to encourage disentanglement between the two types of blendshapes while allowing the model to capture inherent secondary cross-domain deformations present in the training data. The learned blendshapes can be further mapped to the expression and jaw pose parameters of the FLAME model, enabling the animation of 3D Gaussian avatars. Qualitative and quantitative experiments demonstrate that our method naturally generates talking faces with specified expressions while maintaining accurate lip synchronization. Perceptual studies further show that our approach achieves superior emotional expressivity compared to existing methods, without compromising lip-sync quality.
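The abstract models speech- and emotion-driven deformation as a linear additive problem over learned blendshapes, with a sparsity loss encouraging disentanglement. A minimal numerical sketch of that formulation (the dimensions, basis matrices, and loss weight here are illustrative assumptions, not the paper's learned values) looks like:

```python
import numpy as np

rng = np.random.default_rng(0)
V = 300                    # flattened mesh vertex coordinates (illustrative size)
Ks, Ke = 8, 6              # speech / emotion blendshape counts (assumed)
B_speech = rng.normal(size=(Ks, V))   # speech-driven blendshape basis
B_emotion = rng.normal(size=(Ke, V))  # emotion-driven blendshape basis
neutral = rng.normal(size=V)          # neutral face template

def animate(w_speech, w_emotion):
    # Linear additive model: neutral face plus weighted speech and emotion offsets.
    return neutral + w_speech @ B_speech + w_emotion @ B_emotion

def sparsity_loss(B_s, B_e, lam=1e-3):
    # L1 penalty pushing each basis to touch few vertices, which encourages
    # the speech and emotion bases to occupy disentangled facial regions.
    return lam * (np.abs(B_s).sum() + np.abs(B_e).sum())

face = animate(rng.normal(size=Ks), rng.normal(size=Ke))
```

In training, the bases would be optimized jointly from the two datasets (VOCAset and Florence4D) rather than sampled randomly; the sketch only shows the additive structure and the sparsity term.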
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Research Report > Experimental Study (0.69)
- Research Report > New Finding (0.46)
- Information Technology > Graphics > Animation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Vision > Face Recognition (0.94)
Environment-aware Motion Matching
Ponton, Jose Luis, Andrews, Sheldon, Andujar, Carlos, Pelechano, Nuria
Interactive applications demand believable characters that respond naturally to dynamic environments. Traditional character animation techniques often struggle to handle arbitrary situations, leading to a growing trend of dynamically selecting motion-captured animations based on predefined features. While Motion Matching has proven effective for locomotion by aligning to target trajectories, animating environment interactions and crowd behaviors remains challenging due to the need to consider surrounding elements. Existing approaches often involve manual setup or lack the naturalism of motion capture. Furthermore, in crowd animation, body animation is frequently treated as a separate process from trajectory planning, leading to inconsistencies between body pose and root motion. To address these limitations, we present Environment-aware Motion Matching, a novel real-time system for full-body character animation that dynamically adapts to obstacles and other agents, emphasizing the bidirectional relationship between pose and trajectory. In a preprocessing step, we extract shape, pose, and trajectory features from a motion capture database. At runtime, we perform an efficient search that matches user input and current pose while penalizing collisions with a dynamic environment. Our method allows characters to naturally adjust their pose and trajectory to navigate crowded scenes.
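The abstract describes a runtime search that matches user input and current pose while penalizing collisions with the dynamic environment. A toy version of that cost (not the paper's actual cost function; the feature layout, 1 m proximity threshold, and penalty weight are assumptions) can be sketched as a penalized nearest-neighbor query over the motion database:

```python
import numpy as np

def motion_match(db_features, db_future_pos, query, obstacles, w_collision=5.0):
    """Toy sketch: pick the database clip whose features best match the query
    while penalizing candidates whose future root position comes within 1 m
    of any obstacle. Assumed layout: features (N, F), positions (N, 2),
    obstacles (M, 2)."""
    feature_cost = np.linalg.norm(db_features - query, axis=1)
    # Distance from each candidate's future position to its nearest obstacle.
    d = np.linalg.norm(db_future_pos[:, None, :] - obstacles[None, :, :], axis=2)
    collision_pen = np.clip(1.0 - d.min(axis=1), 0.0, None)  # >0 within 1 m
    return int(np.argmin(feature_cost + w_collision * collision_pen))

best = motion_match(
    np.array([[0.0, 0.0], [0.1, 0.0]]),  # candidate pose/trajectory features
    np.array([[0.0, 0.0], [5.0, 5.0]]),  # candidate future root positions
    np.array([0.0, 0.0]),                # query features
    np.array([[0.0, 0.0]]),              # one obstacle at the origin
)
```

Here the second clip wins despite a slightly worse feature match, because the first clip's trajectory runs into the obstacle; this is the pose-trajectory coupling the paper emphasizes, in miniature.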
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (13 more...)
- Information Technology > Graphics > Animation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation
Seo, Junyoung, Mira, Rodrigo, Haliassos, Alexandros, Bounareli, Stella, Chen, Honglie, Tran, Linh, Kim, Seungryong, Landgraf, Zoe, Shen, Jie
Audio-driven human animation models often suffer from identity drift during temporal autoregressive generation, where characters gradually lose their identity over time. One solution is to generate keyframes as intermediate temporal anchors that prevent degradation, but this requires an additional keyframe generation stage and can restrict natural motion dynamics. To address this, we propose Lookahead Anchoring, which leverages keyframes from future timesteps ahead of the current generation window, rather than within it. This transforms keyframes from fixed boundaries into directional beacons: the model continuously pursues these future anchors while responding to immediate audio cues, maintaining consistent identity through persistent guidance. This also enables self-keyframing, where the reference image serves as the lookahead target, eliminating the need for keyframe generation entirely. We find that the temporal lookahead distance naturally controls the balance between expressivity and consistency: larger distances allow for greater motion freedom, while smaller ones strengthen identity adherence. When applied to three recent human animation models, Lookahead Anchoring achieves superior lip synchronization, identity preservation, and visual quality, demonstrating improved temporal conditioning across several different architectures. Audio-driven human animation aims to generate realistic human videos synchronized with input audio, with widespread applications in film production, virtual assistants, and digital content creation. The advent of Diffusion Transformers (DiTs) (Peebles & Xie, 2022) has significantly advanced this field, enabling natural human video generation not only for portrait videos but also in diverse environments with complex backgrounds (Xu et al., 2024; Chen et al., 2025a). 
However, current DiT-based models can only handle short clips at a time, typically around 5 seconds, due to the quadratic complexity of diffusion transformer architectures.
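The anchoring scheme above places each window's keyframe ahead of the window rather than inside it, with the lookahead distance trading expressivity against identity adherence. A minimal scheduling sketch (the function and tuple layout are hypothetical, not from the paper) makes the geometry concrete:

```python
def plan_anchors(total_frames, window, lookahead):
    """Hypothetical helper: for each autoregressive window, the guidance anchor
    sits `lookahead` frames past the window's end. When no such future frame
    exists, it falls back to the final frame (akin to self-keyframing on the
    reference). Larger lookahead -> more motion freedom; smaller -> stronger
    identity hold. Returns (start, end, anchor) per window."""
    anchors = []
    for start in range(0, total_frames, window):
        end = min(start + window, total_frames)
        anchors.append((start, end, min(end + lookahead, total_frames - 1)))
    return anchors
```

For a 10-frame clip with 4-frame windows and a lookahead of 2, this yields anchors at frames 6, 9, and 9: every window is pulled toward a frame it never reaches, which is the "directional beacon" behavior described above.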
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Graphics > Animation (0.76)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Multi-identity Human Image Animation with Structural Video Diffusion
Wang, Zhenzhi, Li, Yixuan, Zeng, Yanhong, Guo, Yuwei, Lin, Dahua, Xue, Tianfan, Dai, Bo
Generating human videos from a single image while ensuring high visual quality and precise control is a challenging task, especially in complex scenarios involving multiple individuals and interactions with objects. Existing methods, while effective for single-human cases, often fail to handle the intricacies of multi-identity interactions because they struggle to associate the correct pairs of human appearance and pose condition and model the distribution of 3D-aware dynamics. To address these limitations, we present Structural Video Diffusion, a novel framework designed for generating realistic multi-human videos. Our approach introduces two core innovations: identity-specific embeddings to maintain consistent appearances across individuals and a structural learning mechanism that incorporates depth and surface-normal cues to model human-object interactions. Additionally, we expand existing human video dataset with 25K new videos featuring diverse multi-human and object interaction scenarios, providing a robust foundation for training. Experimental results demonstrate that Structural Video Diffusion achieves superior performance in generating lifelike, coherent videos for multiple subjects with dynamic and rich interactions, advancing the state of human-centric video generation. Code is available here.
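The abstract identifies associating the correct appearance-pose pairs as the core multi-identity failure mode. One generic way to frame that association (a toy sketch under my own assumptions; the paper does not specify greedy cosine matching) is as a similarity-based assignment between identity-specific embeddings and per-pose embeddings:

```python
import numpy as np

def match_identities(id_embed, pose_embed):
    """Toy sketch of identity-to-pose association: cosine similarity between
    identity embeddings (K, D) and pose embeddings (K, D), resolved greedily
    by repeatedly taking the best remaining pair. Hypothetical, not the
    paper's mechanism."""
    a = id_embed / np.linalg.norm(id_embed, axis=1, keepdims=True)
    b = pose_embed / np.linalg.norm(pose_embed, axis=1, keepdims=True)
    sim = a @ b.T
    assign = {}
    for _ in range(len(a)):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        assign[int(i)] = int(j)
        sim[i, :] = -np.inf   # retire this identity
        sim[:, j] = -np.inf   # retire this pose slot
    return assign

pairs = match_identities(np.array([[1.0, 0.0], [0.0, 1.0]]),
                         np.array([[0.0, 1.0], [1.0, 0.0]]))
```

An optimal (Hungarian) assignment would be the more robust choice when embeddings are noisy; greedy matching is used here only to keep the sketch dependency-free.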
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Graphics > Animation (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)